forward message
Approximate Message Passing for Bayesian Neural Networks
Sommerfeld, Romeo, Helms, Christian, Herbrich, Ralf
Bayesian neural networks (BNNs) offer the potential for reliable uncertainty quantification and interpretability, which are critical for trustworthy AI in high-stakes domains. In this work, we advance message passing (MP) for BNNs and present a novel framework that models the predictive posterior as a factor graph. To the best of our knowledge, our framework is the first MP method that handles convolutional neural networks and avoids double-counting training data, a limitation of previous MP methods that causes overconfidence. We evaluate our approach on CIFAR-10 with a convolutional neural network of roughly 890k parameters and find that it can compete with the SOTA baselines AdamW and IVON, even having an edge in terms of calibration. On synthetic data, we validate the uncertainty estimates and observe a strong correlation (0.9) between posterior credible intervals and their probability of covering the true data-generating function outside the training range. While our method scales to an MLP with 5.6 million parameters, further improvements are necessary to match the scale and performance of state-of-the-art variational inference methods.
Deep learning models have achieved impressive results across various domains, including natural language processing (Vaswani et al., 2023), computer vision (Ravi et al., 2024), and autonomous systems (Bojarski et al., 2016). Yet, they often produce overconfident but incorrect predictions, particularly in ambiguous or out-of-distribution scenarios. Without the ability to effectively quantify uncertainty, this can foster both overreliance on models and underreliance, as users stop trusting their outputs entirely (Zhang et al., 2024), and in high-stakes domains like healthcare or autonomous driving, their application can be dangerous (Henne et al., 2020).
To ensure safer deployment in these settings, models must not only predict outcomes but also express how uncertain they are about those predictions to allow for informed decision-making. Bayesian neural networks (BNNs) offer a principled way to quantify uncertainty by capturing a posterior distribution over the model's weights, rather than relying on point estimates as in traditional neural networks. This allows BNNs to express epistemic uncertainty, the model's lack of knowledge about the underlying data distribution.
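The contrast between a point estimate and a posterior over weights can be illustrated with a toy case. The following is a minimal sketch, not the method of the paper: it assumes a one-weight model y = w * x whose weight has a Gaussian posterior, and the names `predictive`, `w_mean` and `w_std` are hypothetical.

```python
def predictive(x, w_mean, w_std):
    """Predictive mean and standard deviation of y = w * x
    when w ~ Normal(w_mean, w_std**2) (an assumed toy posterior)."""
    mean = w_mean * x
    std = abs(x) * w_std  # Var[w * x] = x**2 * Var[w]
    return mean, std


# A point estimate would report only the mean; the posterior also yields
# a spread that grows as x moves away from zero.
for x in (0.5, 2.0, 10.0):
    print(x, predictive(x, w_mean=1.5, w_std=0.1))
```

In this sketch the predictive standard deviation scales with |x|, so inputs far from the origin receive wider uncertainty, which is the qualitative behaviour the text attributes to epistemic uncertainty.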
Block-local learning with probabilistic latent representations
Kappel, David, Nazeer, Khaleelulla Khan, Fokam, Cabrel Teguemne, Mayr, Christian, Subramoney, Anand
The ubiquitous backpropagation algorithm requires sequential updates through the network, introducing a locking problem. In addition, backpropagation relies on the transpose of forward weight matrices to compute updates, introducing a weight transport problem across the network. Locking and weight transport are problems because they prevent efficient parallelization and horizontal scaling of the training process. We propose a new method to address both these problems and scale up the training of large models. Our method works by dividing a deep neural network into blocks and introducing a feedback network that propagates the information from the targets backwards to provide auxiliary local losses. Forward and backward propagation can operate in parallel and with different sets of weights, addressing the problems of locking and weight transport. Our approach derives from a statistical interpretation of training that treats output activations of network blocks as parameters of probability distributions. The resulting learning framework uses these parameters to evaluate the agreement between forward and backward information. Error backpropagation is then performed locally within each block, leading to "block-local" learning. Several previously proposed alternatives to error backpropagation emerge as special cases of our model. We present results on a variety of tasks and architectures, demonstrating state-of-the-art performance using block-local learning. These results provide a new principled framework for training networks in a distributed setting.
WhatsApp: What do app's new forwarding rules mean – and why has it changed how you forward messages?
WhatsApp has introduced major new changes that are intended to stop people spreading messages so easily. The update is intended to stop the spread of false stories, bad advice and misinformation, both about the coronavirus outbreak and more generally. The feature might initially appear strange for a social network, since such networks usually encourage easy ways of forwarding on messages and increase the reach of more popular posts. But WhatsApp has said the feature is specifically intended to "constrain virality", to keep the app "personal and private" as well as looking to slow the spread of hoaxes and rumours. What do the new rules mean?
Weakly-supervised Dictionary Learning
You, Zeyu, Raich, Raviv, Fern, Xiaoli Z., Kim, Jinsub
We present a probabilistic modeling and inference framework for discriminative analysis dictionary learning under a weak supervision setting. Dictionary learning approaches have been widely used for tasks such as low-level signal denoising and restoration as well as high-level classification tasks, which can be applied to audio and image analysis. Synthesis dictionary learning aims at jointly learning a dictionary and corresponding sparse coefficients to provide accurate data representation. This approach is useful for denoising and signal restoration, but may lead to sub-optimal classification performance. By contrast, analysis dictionary learning provides a transform that maps data to a sparse discriminative representation suitable for classification. We consider the problem of analysis dictionary learning for time-series data under a weak supervision setting in which signals are assigned a global label instead of an instantaneous label signal. We propose a discriminative probabilistic model that incorporates both label information and sparsity constraints on the underlying latent instantaneous label signal using cardinality control. We present the expectation maximization (EM) procedure for maximum likelihood estimation (MLE) of the proposed model. To facilitate a computationally efficient E-step, we propose both a chain and a novel tree graph reformulation of the graphical model. The performance of the proposed model is demonstrated on both synthetic and real-world data.
Increased Privacy with Reduced Communication in Multi-Agent Planning
Maliah, Shlomi (Ben-Gurion University of the Negev) | Brafman, Ronen I. (Ben-Gurion University of the Negev) | Shani, Guy (Ben-Gurion University of the Negev)
Multi-agent forward search (MAFS) is a state-of-the-art privacy-preserving planning algorithm. We describe a new variant of MAFS, called multi-agent forward-backward search (MAFBS), that uses both forward and backward messages to reduce the number of messages sent and obtain new privacy properties. While MAFS requires agents to send a state s produced by an action a to all agents that can apply any action in s, MAFBS sends such messages forward only to agents that have an action that requires one of the effects of a. To achieve completeness, it sends messages backward to agents that can supply a missing precondition. This more focused message passing scheme reduces the number of states exchanged, and requires that agents be aware only of other agents that they directly interact with, leading to agent privacy.
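The forward routing rule described above can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the function `forward_recipients`, the dictionary layout of `agents`, and the example facts are all hypothetical, and the backward messages that restore completeness are not modelled.

```python
def forward_recipients(sender, effects, agents):
    """Return the agents (other than the sender) that own at least one
    action whose preconditions include one of the given effects.

    agents: dict mapping agent name -> list of actions, where each action
    is a dict with a "pre" set of precondition facts (hypothetical layout).
    """
    effects = set(effects)
    return sorted(
        name
        for name, actions in agents.items()
        if name != sender and any(effects & act["pre"] for act in actions)
    )


agents = {
    "truck": [{"name": "load", "pre": {"pkg_at_depot"}}],
    "plane": [{"name": "fly", "pre": {"pkg_at_airport"}}],
    "crane": [{"name": "lift", "pre": {"pkg_at_depot", "crane_free"}}],
}

# The truck's action achieves "pkg_at_airport"; only the plane needs it.
print(forward_recipients("truck", {"pkg_at_airport"}, agents))  # ['plane']
```

Under this rule the crane never hears about states it cannot build on, which mirrors the reduction in exchanged states that the abstract claims.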
A Hidden Absorbing Semi-Markov Model for Informatively Censored Temporal Data: Learning and Inference
Alaa, Ahmed M., van der Schaar, Mihaela
Modeling continuous-time physiological processes that manifest a patient's evolving clinical states is a key step in approaching many problems in healthcare. In this paper, we develop the Hidden Absorbing Semi-Markov Model (HASMM): a versatile probabilistic model that is capable of capturing the modern electronic health record (EHR) data. Unlike existing models, the HASMM accommodates irregularly sampled, temporally correlated, and informatively censored physiological data, and can describe non-stationary clinical state transitions. Learning the HASMM parameters from the EHR data is achieved via a novel forward-filtering backward-sampling Monte-Carlo EM algorithm that exploits the knowledge of the endpoint clinical outcomes (informative censoring) in the EHR data, and implements the E-step by sequentially sampling the patients' clinical states in the reverse-time direction while conditioning on the future states. Real-time inferences are drawn via a forward-filtering algorithm that operates on a virtually constructed discrete-time embedded Markov chain that mirrors the patient's continuous-time state trajectory. We demonstrate the prognostic utility of the HASMM in a critical care prognosis setting using a real-world dataset for patients admitted to the Ronald Reagan UCLA Medical Center. In particular, we show that using HASMMs, a patient's clinical deterioration can be predicted 8-9 hours prior to intensive care unit admission, with a 22% AUC gain compared to the Rothman index, which is the state-of-the-art critical care risk scoring technology.
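Forward filtering on a discrete-time chain can be sketched in a few lines. The code below is a simplified illustration for an ordinary hidden Markov chain with an absorbing state, not the HASMM algorithm itself, which additionally handles sojourn times and irregular sampling; the function name `forward_filter` and all numbers are assumptions made for the example.

```python
def forward_filter(prior, transition, likelihoods):
    """Return filtered state distributions P(state_t | obs_1..t).

    prior:       list of initial state probabilities
    transition:  transition[i][j] = P(next = j | current = i)
    likelihoods: per time step, a list of P(obs_t | state = j)
    """
    n = len(prior)
    belief = list(prior)
    filtered = []
    for t, lik in enumerate(likelihoods):
        if t > 0:
            # Predict: push the previous belief through the chain.
            belief = [sum(belief[i] * transition[i][j] for i in range(n))
                      for j in range(n)]
        # Update: weight by the observation likelihood and renormalize.
        unnorm = [belief[j] * lik[j] for j in range(n)]
        total = sum(unnorm)  # assumed non-zero in this sketch
        belief = [u / total for u in unnorm]
        filtered.append(belief)
    return filtered


# Two states: 0 = stable, 1 = deteriorated (absorbing). Toy numbers.
transition = [[0.9, 0.1],
              [0.0, 1.0]]
beliefs = forward_filter([1.0, 0.0], transition, [[0.8, 0.2], [0.3, 0.7]])
print(beliefs[-1])
```

Each step uses only the previous belief and the newest observation, which is what makes this kind of recursion suitable for real-time use.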
Inference in Hidden Markov Models with Explicit State Duration Distributions
Dewar, Michael, Wiggins, Chris, Wood, Frank
Hidden Markov models (HMMs) are a fundamental tool for data analysis and exploration. Many variants of the basic HMM have been developed in response to shortcomings in the original HMM formulation [9]. In this paper we address inference in the explicit state duration HMM (EDHMM). By state duration we mean the amount of time an HMM dwells in a state. In the standard HMM specification, a state's duration is implicit and, a priori, distributed geometrically. The EDHMM (or, equivalently, the hidden semi-Markov model [12]) was developed to allow explicit parameterization and direct inference of state duration distributions. EDHMM estimation and inference can be performed using the forward-backward algorithm, though only if the sequence is short or a tight "allowable" duration interval for each state is hard-coded a priori [13]. If the sequence is short, then forward-backward can be run on a state representation that allows for all possible durations up to the observed sequence length. If the sequence is long, then forward-backward remains computationally tractable only if the transitions considered are restricted to durations that lie within pre-specified allowable intervals.
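The statement that a standard HMM implies geometrically distributed durations can be checked numerically. In this minimal sketch, which assumes a state with self-transition probability a, the chance of dwelling exactly d steps is a^(d-1) * (1 - a); the function name `geometric_duration_pmf` and the value a = 0.9 are chosen only for illustration.

```python
def geometric_duration_pmf(self_transition, d):
    """P(duration = d) for a standard HMM state with the given
    self-transition probability; d counts time steps, d >= 1."""
    if d < 1:
        raise ValueError("duration must be at least 1")
    # Stay for d - 1 further steps, then leave once.
    return self_transition ** (d - 1) * (1.0 - self_transition)


a = 0.9
pmf = [geometric_duration_pmf(a, d) for d in range(1, 2001)]
total = sum(pmf)                                        # close to 1
mean = sum(d * p for d, p in enumerate(pmf, start=1))   # close to 1 / (1 - a)
print(total, mean)
```

Because this distribution always peaks at d = 1 and has a single free parameter, it cannot represent states with a typical dwell time well above one step, which is the shortcoming that explicit duration models address.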